An XML-based Tool for Tracking English Inclusions in German Text
Abstract
The use of lexicons and corpora advances both linguistic research and the performance of current natural language processing (NLP) systems. We present a tool that exploits such resources, specifically English and German lexical databases and the World Wide Web, to recognise English inclusions in German newspaper articles. The output of the tool can assist lexical resource developers in monitoring changing patterns of English inclusion usage. The corpus used for the classification covers three different domains. We report the classification results and illustrate their value to linguistic and NLP research.
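The abstract gives no implementation details, so the following is only a minimal sketch of how lexicon lookup with a web-based fallback might label tokens. The toy lexicons, the `classify_token` and `tag_sentence` names, the label strings, and the `web_counts` mapping (standing in for search-engine hit counts) are all assumptions made for illustration, not the tool's actual resources or interface.

```python
from typing import Dict, List, Optional, Set, Tuple

# Toy lexicons, purely illustrative (hypothetical; not the tool's databases).
GERMAN_LEXICON: Set[str] = {"das", "ist", "ein", "neues", "die", "firma"}
ENGLISH_LEXICON: Set[str] = {"software", "update", "security"}


def classify_token(
    token: str,
    german: Set[str],
    english: Set[str],
    web_counts: Optional[Dict[str, Tuple[int, int]]] = None,
) -> str:
    """Label a token as 'EN', 'DE', or 'UNKNOWN'.

    web_counts maps a lowercased word to (german_hits, english_hits);
    it is an assumed stand-in for counts obtained from web searches.
    """
    word = token.lower()
    in_german = word in german
    in_english = word in english

    # Unambiguous lexicon evidence decides directly.
    if in_english and not in_german:
        return "EN"
    if in_german and not in_english:
        return "DE"

    # Ambiguous or out-of-lexicon words fall back to web evidence, if any.
    if web_counts is not None and word in web_counts:
        german_hits, english_hits = web_counts[word]
        return "EN" if english_hits > german_hits else "DE"
    return "UNKNOWN"


def tag_sentence(
    tokens: List[str],
    web_counts: Optional[Dict[str, Tuple[int, int]]] = None,
) -> List[Tuple[str, str]]:
    """Pair every token with its language label."""
    return [
        (tok, classify_token(tok, GERMAN_LEXICON, ENGLISH_LEXICON, web_counts))
        for tok in tokens
    ]


if __name__ == "__main__":
    for tok, label in tag_sentence("Das ist ein neues Software Update".split()):
        print(tok, label)
```

In this sketch the lexicons are consulted first and the web counts are used only when the lexicons cannot decide; whether the actual tool orders its resources this way is not stated in the abstract.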
Similar resources
Investigating Prosodic Modifications for Polyglot Text-to-Speech Synthesis
This paper investigates the need for applying English prosody when synthesising English portions of mixed English/German texts using a German-based polyglot text-to-speech (TTS) synthesis system. The polyglot system is based on a monolingual German TTS system, which uses a phone mapping from English to German to synthesise English texts. Two systems with varying degrees of assimilation to Engli...
An Unsupervised System for Identifying English Inclusions in German Text
We present an unsupervised system that exploits linguistic knowledge resources, namely English and German lexical databases and the World Wide Web, to identify English inclusions in German text. We describe experiments with this system and the corpus which was developed for this task. We report the classification results of our system and compare them to the performance of a trained machine lea...
Automatic detection of English inclusions in mixed-lingual text with an application to parsing
The influence of English continues to grow to the extent that its expressions have begun to permeate the original forms of other languages. It has become more acceptable, and in some cases fashionable, for people to combine English phrases with their native tongue. This language mixing phenomenon typically occurs initially in conversation and subsequently in written form. In fact, there is evid...
Integrating Language Knowledge Resources to Extend the English Inclusion Classifier to a New Language
This paper presents an unsupervised system that classifies English inclusions in written text. It will demonstrate that extending this English inclusion classifier, which was originally designed for German, requires minimal time and effort to adapt to a new language, in this case French. The analysis of several evaluation experiments carried out on French and German data shows that the system p...
Bilingual Information Retrieval with HyREX and Internet Translation Services
HyREX is the Hypermedia Retrieval Engine for XML. Its extensibility is based on the implementation of physical data independence; its query interface on the conceptual level consists of data types with respective vague search predicates. This concept enabled us to add search predicates for the data type text for doing bilingual text retrieval. Our implementation uses free Internet resources fo...
Publication date: 2004